Solving the Linear Bellman Equation via Dual Kernel Embeddings

Authors

  • Yunpeng Pan
  • Xinyan Yan
  • Bo Dai
  • Le Song
  • Evangelos Theodorou
  • Byron Boots
Abstract

We introduce a data-efficient approach for solving the linear Bellman equation, which corresponds to a class of Markov decision processes (MDPs) and stochastic optimal control (SOC) problems. We show that this class of control problems can be cast as a stochastic composition optimization problem, which can be further reformulated as a saddle point problem and solved via dual kernel embeddings [1]. Our method is model-free and uses only one sample per state transition from the stochastic dynamical system. Unlike related work such as Z-learning [2, 3], which is based on temporal-difference learning [4], our method is an online algorithm that follows the true stochastic gradient. Numerical results show that our method outperforms the Z-learning algorithm.
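
For orientation, the Z-learning baseline mentioned in the abstract can be sketched on a toy problem. The sketch below is not the paper's dual kernel embedding method. It assumes the first-exit form of the linear Bellman equation as it is usually written in the linearly solvable MDP literature, z(x) = exp(-q(x)) E[z(x')], where z is the desirability function, q is the state cost, and x' is drawn from the passive dynamics. The five-state chain, the cost values, the step-size schedule, and all function names are hypothetical choices made only for illustration.

```python
import numpy as np


def toy_chain():
    """Hypothetical 5-state random walk; state 4 is terminal (assumption)."""
    n = 5
    P = np.zeros((n, n))
    P[0, 0] = P[0, 1] = 0.5            # reflecting left boundary
    for i in range(1, n - 1):
        P[i, i - 1] = P[i, i + 1] = 0.5
    P[n - 1, n - 1] = 1.0              # absorbing terminal state
    q = np.full(n, 0.2)                # state cost on interior states
    q[n - 1] = 0.0                     # no cost at the terminal state
    terminal = np.zeros(n, dtype=bool)
    terminal[n - 1] = True
    return P, q, terminal


def exact_desirability(P, q, terminal):
    """Solve z = exp(-q) * (P @ z) on interior states by a linear solve."""
    z = np.zeros(len(q))
    z[terminal] = np.exp(-q[terminal])
    interior = ~terminal
    G = np.diag(np.exp(-q[interior]))
    A = np.eye(int(interior.sum())) - G @ P[np.ix_(interior, interior)]
    b = G @ P[np.ix_(interior, terminal)] @ z[terminal]
    z[interior] = np.linalg.solve(A, b)
    return z


def z_learning(P, q, terminal, episodes=2000, c=20.0, seed=0):
    """Tabular Z-learning from sampled transitions of the passive dynamics.

    Each update uses one sampled next state per transition:
        z[x] <- (1 - alpha) * z[x] + alpha * exp(-q[x]) * z[x_next]
    The step size alpha = c / (c + visits) is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    n = len(q)
    z = np.ones(n)
    z[terminal] = np.exp(-q[terminal])
    visits = np.zeros(n)
    starts = np.flatnonzero(~terminal)
    for _ in range(episodes):
        x = rng.choice(starts)
        while not terminal[x]:
            x_next = rng.choice(n, p=P[x])   # single sample, no model access
            visits[x] += 1
            alpha = c / (c + visits[x])
            z[x] = (1 - alpha) * z[x] + alpha * np.exp(-q[x]) * z[x_next]
            x = x_next
    return z


if __name__ == "__main__":
    P, q, terminal = toy_chain()
    print("exact     :", exact_desirability(P, q, terminal))
    print("Z-learning:", z_learning(P, q, terminal))
```

The exact solve needs the transition matrix, whereas the Z-learning loop only sees sampled transitions, which is the model-free setting the abstract refers to. The update bootstraps from the current estimate of z at the next state, in the style of temporal-difference learning.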


Similar resources

Wavelet-based numerical method for solving fractional integro-differential equation with a weakly singular kernel

This paper describes and compares the application of wavelet bases and Block-Pulse functions (BPFs) for solving a fractional integro-differential equation (FIDE) with a weakly singular kernel. First, a collocation method based on Haar wavelets (HW), Legendre wavelets (LW), Chebyshev wavelets (CHW), second kind Chebyshev wavelets (SKCHW), Cos and Sin wavelets (CASW) and BPFs are presented f...


Hilbert Space Embeddings of POMDPs

A nonparametric approach for policy learning for POMDPs is proposed. The approach represents distributions over the states, observations, and actions as embeddings in feature spaces, which are reproducing kernel Hilbert spaces. Distributions over states given the observations are obtained by applying the kernel Bayes’ rule to these distribution embeddings. Policies and value functions are defin...


The solving linear one-dimensional Volterra integral equations of the second kind in reproducing kernel space

In this paper, we solve a linear one-dimensional Volterra integral equation of the second kind. For this purpose, using the form of the equation, we define a linear transformation and, by using its conjugate and reproducing kernel functions, obtain a basis for the function space. Then we obtain the solution of the integral equation in terms of the basis functions. The examples presented in this ...


Kernel-Based Reinforcement Learning Using Bellman Residual Elimination

This paper presents a class of new approximate policy iteration algorithms for solving infinite-horizon, discounted Markov decision processes (MDPs) for which a model of the system is available. The algorithms are similar in spirit to Bellman residual minimization methods. However, by exploiting kernel-based regression techniques with nondegenerate kernel functions as the underlying cost-to-go ...


An Interior Point Algorithm for Solving Convex Quadratic Semidefinite Optimization Problems Using a New Kernel Function

In this paper, we consider convex quadratic semidefinite optimization problems and provide a primal-dual Interior Point Method (IPM) based on a new kernel function with a trigonometric barrier term. Iteration complexity of the algorithm is analyzed using some easy to check and mild conditions. Although our proposed kernel function is neither a Self-Regular (SR) fun...




Publication date: 2017